Skip to content

Add per-subnet attestation aggregate count coverage to metrics#406

Merged
pablodeymo merged 2 commits into
lambdaclass:mainfrom
conache:add-per-subnet-attestation-aggregate-coverage
Jun 1, 2026
Merged

Add per-subnet attestation aggregate count coverage to metrics#406
pablodeymo merged 2 commits into
lambdaclass:mainfrom
conache:add-per-subnet-attestation-aggregate-coverage

Conversation

@conache
Copy link
Copy Markdown
Contributor

@conache conache commented Jun 1, 2026

🗒️ Description / Motivation

lean_attestation_aggregate_coverage_validators is registered with [section, subnet] labels, but only the subnet=combined total was emitted, while the per-subnet series was reserved in the metric definition and left empty. This PR populates subnet=subnet_N for each subnet alongside the existing combined total, surfacing per-subnet validator coverage in dashboards/alerts.

What Changed

  • crates/blockchain/src/coverage.rscov_record now derives a per-subnet validator count from the seen bitmap (vid % committee_count bucketing, same formula as the existing gossip subnet assignment) and emits one subnet=subnet_<i> gauge per subnet. Removed the resolved TODO(#398) comment.
  • crates/blockchain/src/metrics.rs — updated the lean_attestation_aggregate_coverage_validators description to reflect that both label values are now populated.
  • docs/metrics.md — updated the labels column for the same metric.

Correctness / Behavior Guarantees

  • cov_add is unchanged; the new per-subnet counts are derived at emit time inside cov_record. No signature changes, no caller updates.
  • The new subnet=subnet_<i> series is additive: the existing subnet=combined value is unchanged, so any pre-existing query/alert keeps working.
  • cov_record emits per-subnet values for every i ∈ [0, committee_count), including zero counts — so dashboards see a flat-zero series rather than missing samples when a subnet has no coverage.
  • lean_attestation_aggregate_coverage_subnets (separate metric, counts how many subnets had any coverage) is unchanged.

Cardinality

Series = sections × (subnets + 1) per the issue. With 6 sections (timely, late, block, combined, agg_start_new, proposal_combined) and the typical configured attestation_committee_count (1–4 on current devnets), the metric tops out at ~30 series. Bounded by ATTESTATION_COMMITTEE_COUNT.

Tests Added / Run

None added. Verified manually by running a local devnet and confirming /metrics exposes one subnet=subnet_N series per configured subnet for every section, with the combined-vs-sum-of-subnets invariant holding.

Related Issues / PRs

✅ Verification Checklist

  • Ran make fmt — clean
  • Ran make lint (clippy with -D warnings) — clean
  • Ran cargo test --workspace --release — all passing
  • Ran local devnet and checked the /metrics endpoint's response

@conache conache marked this pull request as ready for review June 1, 2026 15:36
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 1, 2026

Greptile Summary

This PR closes issue #398 by populating the previously reserved subnet=subnet_N label for lean_attestation_aggregate_coverage_validators, which was defined with a [section, subnet] label set but only ever emitted the subnet=combined aggregate.

  • coverage.rs: cov_record now iterates the seen bitmap after the combined count and buckets each seen validator ID into subnet_counts[vid % cc], emitting one gauge per subnet (including zero-count subnets for completeness). The formula reuses the same vid % committee_count assignment already established by cov_add.
  • metrics.rs / docs/metrics.md: Description and docs updated to reflect that both subnet=combined and subnet=subnet_N labels are now populated; the stale "not yet populated" caveat is removed.

Confidence Score: 5/5

Safe to merge — the change is purely additive, emitting new per-subnet gauge series without touching the existing combined total or any caller signatures.

The new code is a tight, self-contained addition inside cov_record. The vid % cc bucketing formula is already established and documented at the module level, and the zero-count-always-emitted behavior prevents missing-series artifacts in dashboards. The combined gauge is untouched, so no existing query or alert is affected. Documentation and description are consistent with the implementation.

No files require special attention.

Important Files Changed

Filename Overview
crates/blockchain/src/coverage.rs Adds per-subnet gauge emission in cov_record using vid % cc bucketing; formula is consistent with cov_add and the module-level comment about gossip subnet assignment. Zero-count subnets are always emitted, preventing missing-series gaps in dashboards.
crates/blockchain/src/metrics.rs Description-only update to lean_attestation_aggregate_coverage_validators; removes the stale 'not yet populated' caveat and documents both label values. No functional change.
docs/metrics.md Labels column updated to show subnet=combined,subnet_0,…,subnet_N-1; accurate and consistent with the code change.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[cov_record called\nsection, seen, has_subnet] --> B[emit subnet=combined\ncount of all seen validators]
    B --> C{cc = has_subnet.len\ncc > 0?}
    C -- No --> E[emit lean_attestation_aggregate_coverage_subnets]
    C -- Yes --> D[build subnet_counts vec of length cc\nfor each seen vid: subnet_counts at vid mod cc += 1]
    D --> F[for each i in 0..cc\nemit subnet=subnet_i gauge]
    F --> E
Loading

Reviews (1): Last reviewed commit: "Add per-subnet attestation aggreagte cou..." | Re-trigger Greptile

@pablodeymo pablodeymo merged commit 7c3cd6a into lambdaclass:main Jun 1, 2026
@MegaRedHand MegaRedHand changed the title Add per-subnet attestation aggreagte count coverage to metrics Add per-subnet attestation aggregate count coverage to metrics Jun 1, 2026
MegaRedHand pushed a commit that referenced this pull request Jun 3, 2026
…ction (#414)

## 🗒️ Description / Motivation

Ports the five `lean_block_proposal_*` observability metrics from
[leanSpec PR #753](leanEthereum/leanSpec#753)
into the ethlambda block builder.

These metrics give cross-client visibility into the block-proposal
attestation-selection path (`build_block`): how long each phase takes,
how many proposal builds run, how many child payloads are greedily
consumed, and how many distinct `AttestationData` / aggregated proofs
end up in the proposed block. They align with [zeam
#914](blockblaz/zeam#914
`getProposalAttestations` instrumentation and the leanSpec naming so the
[leanMetrics](https://github.com/leanEthereum/leanMetrics) dashboards
work across clients.

## What Changed

**`crates/blockchain/src/metrics.rs`** — five new metrics registered
with the existing `LazyLock` + `register_*!` pattern, a
`BLOCK_PROPOSAL_ATTESTATION_BUILD_PHASES` label constant, registration
in `init()`, public API functions, and a unit test:

| Metric | Type | Buckets / Labels |
|--------|------|------------------|
| `lean_block_proposal_attestation_build_phase_seconds` | HistogramVec |
`phase` = `select_payloads`, `compact`, `stf_simulate`; buckets
`0.001…8` |
| `lean_block_proposal_attestation_builds_total` | Counter | one per
proposal attempt |
| `lean_block_proposal_child_payloads_consumed_total` | Counter |
greedily-picked proofs before compaction |
| `lean_block_proposal_attestation_data_selected` | Histogram | buckets
`0, 1, 2, 4, 8, 16, 32` |
| `lean_block_proposal_aggregates_selected` | Histogram | buckets `0, 1,
2, 4, 8, 16, 32, 64, 128` |

**`crates/blockchain/src/block_builder.rs`** — instruments
`build_block`: times the `select_attestations`, `compact_attestations`,
and STF (`process_slots` + `process_block`) phases, and emits the
counters/histograms after a successful build.

**`docs/metrics.md`** — documents all five in the Block Production
Metrics table.

## Correctness / Behavior Guarantees

- **No behavior change.** Only metric observations were added around
existing logic; block contents, selection order, and the
state-transition path are untouched.
- **Architectural divergence from leanSpec, documented.** leanSpec
re-runs the STF inside a fixed-point loop and observes `stf_simulate`
per round. ethlambda projects justification/finalization incrementally
during selection and runs the STF exactly **once** at the end, so its
`stf_simulate` is a single observation per build. This is noted on the
`BLOCK_PROPOSAL_ATTESTATION_BUILD_PHASES` doc comment, consistent with
the upstream PR's own caveat that phase timings are not directly
comparable across clients.
- `attestation_data_selected` and `aggregates_selected` are observed
from the post-compaction body (one merged proof per distinct
`AttestationData`), matching the spec's intent.
- Metrics are only emitted on a **successful** build; a build that
errors out in the STF is already counted by the existing
`lean_block_building_failures_total`.

## Tests Added / Run

- New unit test
`metrics::tests::block_proposal_attestation_build_metrics_are_usable` —
verifies the phase metric registers and accepts every label in
`BLOCK_PROPOSAL_ATTESTATION_BUILD_PHASES`, and that the companion
counters/histograms are callable. Guards against drift between the label
constant and the strings passed at the `build_block` call sites.
- Existing `block_builder` tests (`build_block_*`,
`compact_attestations_*`, `extend_proofs_greedily_*`) now exercise the
new metric paths and pass unchanged.

Commands run:
- `make fmt` — clean
- `cargo clippy -p ethlambda-blockchain --all-targets -- -D warnings` —
clean
- `cargo test -p ethlambda-blockchain --lib` — 23 passing

## Related Issues / PRs

- Ports
[leanEthereum/leanSpec#753](leanEthereum/leanSpec#753)
- Related to
[blockblaz/zeam#914](blockblaz/zeam#914)
- Follows the metrics pattern from #406 (per-subnet attestation
aggregate coverage)

## ✅ Verification Checklist

- [x] Ran `make fmt` — clean
- [x] Ran `make lint` (clippy with `-D warnings`) — clean
- [ ] Ran `cargo test --workspace --release` — only
`ethlambda-blockchain` lib suite run (23 passing); full workspace
release run not yet executed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Emit per-subnet attestation aggregate coverage

2 participants